PREFACE

A part of our goal at CSE has been to develop new and improved
psychometric techniques to study, develop and characterize achievement
tests and achievement test items. Recently our efforts have been focused
on certain errors that occur when using criterion-referenced tests.
In particular, we have investigated problems related to estimating
and controlling the false-positive and false-negative error rates
associated with a test and a population of examinees. In other words,
we are concerned about passing those examinees who should pass, and
retaining those examinees who need remedial work. This paper deals with
one aspect of that problem.



ABSTRACT

Wilcox (1977) examines two methods of estimating the probability of
a false-positive or false-negative decision with a mastery test. Both
procedures make assumptions about the form of the true score distribution
which might not give good results in all situations. In this paper,
upper and lower bounds on the two possible error types are described
which make no assumption about the form of the true score distribution.
Illustrations are given on how these bounds might be used to determine
the length of the test.



Introduction 

Recently, Wilcox (1977) considered two methods of estimating the
probability of making a false-positive or false-negative decision with a
mastery test. Both of these procedures make an assumption about the
form of the distribution of true scores over the population of examinees.
In this paper, upper and lower bounds to these probabilities are described
which make no assumption about the true score distribution beyond that
its first two moments exist. We begin by stating explicitly the model
that will be used to describe a mastery test, after which we consider
briefly the importance of false-positive and false-negative decisions
relative to the other proposed methods of characterizing such tests.

1. The Model



Consistent with Hambleton and Novick (1973), Harris (1974), Novick
and Lewis (1974), Huynh (1976), Fhaner (1974), and Wilcox (1977), we
may describe a mastery test as follows: An instructional program is
developed with the goal of fostering certain specific skills in the
students taking the course. For each skill area, a domain of test items
is constructed. A total of $n$ items is randomly sampled from this domain
and administered to an examinee for the purpose of determining whether
the examinee's true score, say $\zeta$, is above or below the known
criterion score $\zeta_0$. If $\zeta \ge \zeta_0$ the examinee is a master
and he/she is advanced to the next level of instruction; otherwise, the
examinee is given remedial work. The decision $\zeta \ge \zeta_0$ is made
if, and only if, $x \ge x_0$ where $x_0$ is some appropriately chosen
passing score and where $x$ is the examinee's number correct observed
score. Note that the choice for the passing score $x_0$ may be made in
accordance with the "losses" associated with the probability of a
false-positive or false-negative decision (e.g., Hambleton and Novick,
1973).



For this model of a mastery test, there are two possible errors.
The first is a false-positive error which occurs when $x \ge x_0$ and
$\zeta < \zeta_0$. A false-negative error occurs when $x < x_0$ and
$\zeta \ge \zeta_0$.

Let $\alpha = \Pr(x \ge x_0,\ \zeta < \zeta_0)$ and
$\beta = \Pr(x < x_0,\ \zeta \ge \zeta_0)$. In this paper $\alpha$ and
$\beta$ are defined in terms of a group of individuals. In particular,
$g(\zeta)$, the distribution of $\zeta$, is the probability density
function of true scores over a population of examinees. This is in
contrast to the Bayesian approach where $g(\zeta)$ is the prior
distribution for a specific examinee. (See, e.g., Novick and Lewis, 1974.)

As mentioned earlier, Wilcox (1977) describes two methods of
estimating $\alpha$ and $\beta$, both of which assume that the
distribution of $\zeta$ over the population of examinees has a particular
parametric form. The first estimation procedure assumes that the
conditional distribution of observed scores for a single examinee is
given by

$$f(x \mid \zeta) = \binom{n}{x} \zeta^{x} (1-\zeta)^{n-x}, \tag{1.1}$$

the binomial probability function, and that the distribution of $\zeta$ is

$$g(\zeta) = \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}\, \zeta^{r-1} (1-\zeta)^{s-1}, \tag{1.2}$$

the beta distribution with parameters $r > 0$ and $s > 0$. For $n > 10$ it
appears that this estimation procedure gives fairly good results even when
observations are generated according to a two-term approximation to the
compound binomial distribution. The same is also true for the other
estimation procedure, which uses an arc-sine transformation on the
observed score of an examinee and which assumes a normal prior.
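As a rough illustration of what $\alpha$ and $\beta$ look like when the beta-binomial assumptions (1.1) and (1.2) are taken at face value, the following Python sketch integrates the two error probabilities numerically. It is not the estimation procedure of Wilcox (1977); the function names, the midpoint rule, and the values of `n`, `x0`, `zeta0`, `r` and `s` are all assumptions chosen only for illustration.

```python
import math

def binom_pmf(x, n, zeta):
    """Binomial probability f(x | zeta), expression (1.1)."""
    return math.comb(n, x) * zeta**x * (1.0 - zeta) ** (n - x)

def beta_pdf(zeta, r, s):
    """Beta density g(zeta), expression (1.2), for 0 < zeta < 1."""
    log_norm = math.lgamma(r + s) - math.lgamma(r) - math.lgamma(s)
    return math.exp(log_norm + (r - 1) * math.log(zeta) + (s - 1) * math.log(1 - zeta))

def error_rates(n, x0, zeta0, r, s, steps=4000):
    """Midpoint-rule approximation of alpha and beta over the population."""
    h = 1.0 / steps
    alpha = beta = 0.0
    for i in range(steps):
        zeta = (i + 0.5) * h
        weight = beta_pdf(zeta, r, s) * h
        p_pass = sum(binom_pmf(x, n, zeta) for x in range(x0, n + 1))
        if zeta < zeta0:
            alpha += weight * p_pass          # passes, but true score below criterion
        else:
            beta += weight * (1.0 - p_pass)   # fails, but true score at or above criterion
    return alpha, beta

# Hypothetical values, not taken from the paper.
alpha, beta = error_rates(n=10, x0=8, zeta0=0.8, r=8.0, s=2.0)
print(f"alpha ~ {alpha:.3f}, beta ~ {beta:.3f}")
```

Changing `r` and `s` shows how strongly the two error rates depend on the assumed shape of the true score distribution, which is the concern that motivates distribution-free bounds.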



The assumption that $\zeta$ has a beta distribution deserves serious
consideration since there is evidence that the resulting beta-binomial
(or negative hypergeometric) probability model may give a good fit to
data (Keats and Lord, 1962; Lord, 1965). One difficulty with this model
is that, with the exception of U-shaped distributions, the distribution
of true scores can have at most one mode. Thus, it is not at all clear
whether the beta-binomial model will yield reasonably accurate values
for $\alpha$ and $\beta$ in every case.

One possible solution to this problem is to consider some other
method of estimating the true score distribution. (See, e.g., Maritz,
1967; Lord, 1969.) However, the robustness of these alternate models in
terms of estimating $\alpha$ and $\beta$ is unknown and difficult to
ascertain.

Another possibility is to use some coefficient that reflects
indirectly the values of $\alpha$ and $\beta$ but which makes no
assumption about the form of $g(\zeta)$. For example, one might use the
proportion of agreement (Hambleton and Novick, 1973) or Cohen's Kappa
(Swaminathan, Hambleton and Algina, 1974). Several other coefficients
have been proposed as well (Harris, 1974; Livingston, 1972; Brennan and
Kane, 1977). In terms of $\alpha$ and $\beta$, all of these coefficients
present at least two problems. First, the exact relationship of $\alpha$
and $\beta$ to these other indexes is unknown. Second, none of these
other indexes makes a distinction between false-positive and
false-negative decisions. This latter problem is particularly troublesome
since the seriousness of a false-positive decision may not be the same as
the seriousness of a false-negative decision which, in turn, may have an
effect on the decision rule used to determine whether $\zeta$ is above or
below $\zeta_0$. An illustration of this point arises in the situation
considered by Hambleton and Novick (1973), Novick and Lewis (1974) and
Huynh (1976) in which constant losses are associated with the two
possible errors. Thus, we let the constants $c_1$ and $c_2$ represent the
"cost" of a false-positive and false-negative decision, respectively.
Within this framework a natural choice for the passing score $x_0$ is the
one which minimizes

$$c_1 \alpha + c_2 \beta, \tag{1.3}$$

the Bayes risk. An index such as the proportion of agreement is of
little help in the search for an optimal passing score since we can
guarantee that its maximum value of one will be attained simply by
passing (or failing) every examinee. This is not to say that indexes
such as the proportion of agreement or Cohen's Kappa have little or no
value. Indeed, these indexes are important since, at a minimum, we want
to make consistent decisions across comparable mastery tests. The
advantage of $\alpha$ and $\beta$ is that they provide a direct
indication of how certain we can be that a correct decision is being made
when trying to decide whether $\zeta$ is above or below $\zeta_0$. For
still more illustrations of this point, the reader is referred to Huynh
(1976), Van der Linden and Mellenbergh (1977) and Wilcox (1977).



Given that it is desirable to know the values of $\alpha$ and $\beta$,
it is natural to want to know whether their value is small, regardless of
the actual form of the true score distribution. With this goal in mind, we
consider situations which yield upper and lower bounds for both $\alpha$
and $\beta$ but which make no assumption about the form of $g(\zeta)$.

2. An Upper Bound as a Function of n

Before describing our main results, we note that an upper bound to
$\alpha$ and $\beta$ is readily derived when the binomial error model
(Lord and Novick, 1968, Chapter 23) is assumed to hold. In other words,
we are assuming that the conditional distribution of observed scores for
an examinee is given by expression (1.1). From Wilcox (in press) it
follows immediately that

$$\alpha \le \sum_{x=x_0}^{n} f(x \mid \zeta = \zeta_0) \tag{2.1}$$

and

$$\beta \le \sum_{x=0}^{x_0-1} f(x \mid \zeta = \zeta_0). \tag{2.2}$$
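The bounds (2.1) and (2.2) are binomial tail sums evaluated at the criterion score, so they are easy to tabulate. The sketch below is a minimal Python illustration; the function names and the values $n = 10$, $x_0 = 8$, $\zeta_0 = .8$ are assumptions made for the example, not values given at this point in the paper.

```python
import math

def upper_tail(n, x0, zeta):
    """Pr(x >= x0 | zeta) under the binomial error model (1.1)."""
    return sum(math.comb(n, x) * zeta**x * (1 - zeta) ** (n - x)
               for x in range(x0, n + 1))

def bounds_from_n(n, x0, zeta0):
    """Right-hand sides of (2.1) and (2.2)."""
    alpha_bound = upper_tail(n, x0, zeta0)
    beta_bound = sum(math.comb(n, x) * zeta0**x * (1 - zeta0) ** (n - x)
                     for x in range(0, x0))
    return alpha_bound, beta_bound

# Hypothetical test: 10 items, passing score 8, criterion 0.8.
a_bound, b_bound = bounds_from_n(n=10, x0=8, zeta0=0.8)
print(round(a_bound, 4), round(b_bound, 4))  # 0.6778 0.3222
```

Because both sums are taken at the same $\zeta_0$, the two bounds always add to one, so they cannot both be small. This is one way to see why sharper bounds are worth pursuing.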

We observe that from a theoretical point of view, the assumption
that $f(x \mid \zeta)$ is a binomial probability function has been
criticized by several writers when an item sampling model applies
(Hambleton et al., 1978; Lord and Novick, 1968, Chapter 23; Lord, 1965;
Meredith and Kearns, 1973). The binomial error model would seem to
deserve serious consideration in practice since even more restrictive
models give a good fit to data (Keats and Lord, 1962; Lord, 1965).
Nevertheless, one might prefer a more general probability function for
describing the conditional distribution of observed scores. Lord (1965)
as well as Lord and Novick (1968, Chapter 23) suggest that a two-term
approximation to the compound binomial be used. Results reported by
Wilcox (in press) suggest that when this more general probability
function is adopted, the "intuitively obvious" upper bounds to $\alpha$
and $\beta$ given by expressions (2.1) and (2.2) will still hold.
However, a rigorous proof that this is the case remains to be found.

3. Upper and Lower Bounds to $\alpha$ and $\beta$ That are a Function
of the First Two Moments of the True Score Distribution

Is it possible to improve upon the upper bounds on $\alpha$ and $\beta$
given by expressions (2.1) and (2.2) without making any assumption about
the form of $g(\zeta)$? In many cases, the answer is yes.
For notational convenience we let:

$A_1$ = the event $x \ge x_0$
$A_1^c$ = the event $x < x_0$
$A_2$ = the event $\zeta \ge \zeta_0$
$A_2^c$ = the event $\zeta < \zeta_0$

The intersection of two events is denoted by the juxtaposition of the
corresponding symbols. Thus, $A_1 A_2$ represents the event $x \ge x_0$
and $\zeta \ge \zeta_0$, i.e., a correct-positive decision. We begin by
deriving lower bounds to $\Pr(A_1^c A_2^c)$ and $\Pr(A_1 A_2)$.

Let $\mu$ and $\sigma^2$ represent the mean and variance of the true
scores of the examinees. In practice $\mu$ and $\sigma^2$ are unknown;
however, they may be estimated as follows: Let $x_1, \ldots, x_k$ be the
observed scores of $k$ randomly selected examinees taking an $n$-item
test. For the binomial error model (Lord and Novick, 1968, Chapter 23)

$$\hat{\mu} = (kn)^{-1} \sum_{i=1}^{k} x_i \quad\text{and}\quad \hat{\sigma}^2 = \hat{\mu}_2 - \hat{\mu}^2$$

may be used as an estimate of $\mu$ and $\sigma^2$ where

$$\hat{\mu}_2 = [kn(n-1)]^{-1} \sum_{i=1}^{k} (x_i^2 - x_i).$$

If a two-term approximation to the compound binomial distribution is
preferred, we still use $\hat{\mu}$ to estimate $\mu$ but we replace
$\hat{\sigma}^2$ with

$$\hat{\sigma}^2 = \frac{s^2 - (n-2d)\,\hat{\mu}(1-\hat{\mu})}{n(n-1)+2d}$$

where [the defining expression for $d$ is illegible in the source],
$s^2$ and $\bar{x}$ are the variance and mean of observed scores and
where $\sigma_\pi^2$ is the variance of the item difficulties.
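The binomial-error-model estimates can be computed directly from a list of number-correct scores. The following Python sketch assumes the second-moment form $[kn(n-1)]^{-1}\sum(x_i^2 - x_i)$ read from the text above; the function name is hypothetical, and the data are the score frequencies for the five-item test from Huynh (1976) that the paper analyzes in its final illustration.

```python
def true_score_moments(scores, n):
    """Estimate the mean and variance of true scores under the binomial error model."""
    k = len(scores)
    mu_hat = sum(scores) / (k * n)
    # E[x(x-1)] = n(n-1) E[zeta^2] for a binomial score, which gives the second moment.
    second_moment = sum(x * (x - 1) for x in scores) / (k * n * (n - 1))
    return mu_hat, second_moment - mu_hat**2

# Frequencies of scores 0..5 for 91 examinees on a five-item test.
frequencies = {0: 4, 1: 14, 2: 9, 3: 17, 4: 21, 5: 26}
scores = [x for x, count in frequencies.items() for _ in range(count)]

mu_hat, var_hat = true_score_moments(scores, n=5)
print(round(mu_hat, 3), round(var_hat, 3))  # 0.653 0.064
```

The mean agrees with the value .653 reported in the paper. The variance comes out at about .064 rather than the reported .065; a slightly different divisor in the variance of observed scores would account for a difference of that size, but that is a guess.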



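Let

[expression (3.1) is illegible in the source]

and

$$U = \begin{cases} \dfrac{\sigma^2}{\sigma^2 + (\mu - \zeta_0)^2}, & \text{if } 0 < \sigma^2 \le m \\[2ex] \dfrac{\mu(1-\mu) - \sigma^2}{(1-\zeta_0)\,\zeta_0}, & \text{otherwise} \end{cases} \tag{3.2}$$

where

$$m = \max\{\mu(\zeta_0 - \mu),\ (\mu - \zeta_0)(1 - \mu)\}. \tag{3.3}$$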
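The two-branch quantity $U$ of (3.2), with $m$ from (3.3), can be sketched in a few lines of Python. The first branch is taken here to be $\sigma^2 / (\sigma^2 + (\mu - \zeta)^2)$, a reading that reproduces the values .544 and .0476 quoted in the worked example of Section 5; the function name and argument order are assumptions.

```python
def moment_bound(mu, var, zeta):
    """Two-branch bound of (3.2), using m from (3.3).

    mu and var are the mean and variance of the true score distribution;
    zeta is the cutoff (zeta_0 in Section 3, zeta_1 or zeta_2 in Section 5).
    """
    m = max(mu * (zeta - mu), (mu - zeta) * (1 - mu))
    if 0 < var <= m:
        return var / (var + (mu - zeta) ** 2)
    return (mu * (1 - mu) - var) / ((1 - zeta) * zeta)

# Values from the Section 5 illustration: mu = .945, sigma^2 = .003.
print(round(moment_bound(0.945, 0.003, 0.9), 3))  # 0.544
print(round(moment_bound(0.945, 0.003, 0.7), 4))  # 0.0476
```

Only the mean, the variance and the cutoff enter the calculation, which is what allows the bound to be used without specifying the form of $g(\zeta)$.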



It follows from results given by Skibinsky (1977) that

$$\Pr(A_2) \le U. \tag{3.4}$$

From the Bonferroni inequality (see, e.g., Miller, 1966, p. 8),

$$\Pr(A_1^c A_2^c) \ge 1 - \Pr(A_1) - \Pr(A_2) \tag{3.5}$$

which, together with (3.4), implies that

$$\Pr(A_1^c A_2^c) \ge 1 - U - \Pr(A_1). \tag{3.6}$$

The proportion of examinees passing the test serves as an estimate of
$\Pr(A_1)$. Thus, we have an estimate of a lower bound on
$\Pr(A_1^c A_2^c)$.

In some cases the lower bound to $\Pr(A_1^c A_2^c)$ will be close to one
which implies that both $\alpha$ and $\beta$ are relatively small, since
both are less than or equal to $1 - \Pr(A_1^c A_2^c)$. In particular,
(3.6) implies that

$$\alpha \le U + \Pr(A_1) \quad\text{and}\quad \beta \le U + \Pr(A_1). \tag{3.7}$$

A lower bound on $\Pr(A_1 A_2)$ can also be derived by replacing
[illegible] in (3.2) with [illegible]. The resulting value of $U$, say
$U_1$, is such that

$$\Pr(A_2^c) \le U_1.$$

Thus,

$$\Pr(A_1 A_2) \ge 1 - U_1 - \Pr(A_1^c).$$



An upper bound for both $\alpha$ and $\beta$ is readily derived as
follows: Since $A_1 A_2^c$ is a subset of both $A_1$ and $A_2^c$,
$\alpha \le \Pr(A_1)$ and $\alpha \le \Pr(A_2^c)$, and so

$$\alpha \le \min[\Pr(A_1),\ U_1]. \tag{3.8}$$

Similarly,

$$\beta \le \min[\Pr(A_1^c),\ U]. \tag{3.9}$$

We conclude this section by describing lower bounds on both $\alpha$
and $\beta$. For $\alpha$ we have that

$$\alpha = \Pr(A_1 A_2^c) \ge 1 - \Pr(A_1^c) - \Pr(A_2) \ge 1 - U - \Pr(A_1^c). \tag{3.10}$$

For similar reasons

$$\beta \ge 1 - U_1 - \Pr(A_1). \tag{3.11}$$
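Taken together, the upper bounds (3.8) and (3.9) and the lower bounds (3.10) and (3.11) need only three inputs: the proportion of examinees passing and the two moment bounds. The Python sketch below collects them in one place. The function name and the input values are hypothetical, and the truncation of the lower bounds at zero is an addition that reflects only the fact that probabilities are nonnegative.

```python
def error_bounds(p_pass, u, u1):
    """Bounds on alpha and beta from Pr(A_1) and the two moment bounds.

    p_pass estimates Pr(A_1), the proportion of examinees passing;
    u bounds Pr(A_2) and u1 bounds Pr(A_2^c), as used in (3.8)-(3.11).
    """
    p_fail = 1.0 - p_pass
    return {
        "alpha_upper": min(p_pass, u1),               # (3.8)
        "beta_upper": min(p_fail, u),                 # (3.9)
        "alpha_lower": max(0.0, 1.0 - u - p_fail),    # (3.10), truncated at 0
        "beta_lower": max(0.0, 1.0 - u1 - p_pass),    # (3.11), truncated at 0
    }

# Hypothetical inputs, not taken from the paper.
bounds = error_bounds(p_pass=0.6, u=0.5, u1=0.3)
for name, value in bounds.items():
    print(f"{name}: {value:.2f}")
```

When the untruncated lower bound is negative it carries no information, which is the common case unless $U$ or $U_1$ is small.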



4. An Upper Bound to $\alpha$ and $\beta$ Assuming That
the Binomial Error Model Holds

In the previous section we described upper and lower bounds to $\alpha$
and $\beta$ which depend only on our ability to estimate the first and
second moments of the true score distribution. As noted above, such
estimates are readily available when the conditional distribution of
observed scores for any examinee is given by a two-term approximation to
a compound binomial distribution. As shown by Rutherford and Krutchkoff
(1967) such estimates are also available for a wide variety of
situations.



In this section we indicate how the inequalities (3.10) and (3.11)
might be improved upon when the binomial error model is assumed to hold.
Since

$$\alpha = \Pr(A_1 A_2^c) = \Pr(A_2^c)\,\Pr(A_1 \mid A_2^c)$$

it follows that

$$\alpha \le U_1 \sum_{x=x_0}^{n} f(x \mid \zeta < \zeta_0). \tag{4.1}$$

From known properties about the binomial probability function (see
Wilcox, in press; Fhaner, 1974) which can be derived from results given
by Lehmann (1959), we have that

$$\sum_{x=x_0}^{n} f(x \mid \zeta < \zeta_0) \le \sum_{x=x_0}^{n} f(x \mid \zeta = \zeta_0).$$

Hence,

$$\alpha \le U_1 \sum_{x=x_0}^{n} \binom{n}{x} \zeta_0^{x} (1-\zeta_0)^{n-x}. \tag{4.2}$$

For similar reasons, it can be seen that

$$\beta \le U \sum_{x=0}^{x_0-1} \binom{n}{x} \zeta_0^{x} (1-\zeta_0)^{n-x}. \tag{4.3}$$

It was suggested to the author that a theorem by Markov (recently
applied by Lord and Stocking, 1976) might be applied to obtain bounds on
$\alpha$ and $\beta$. It should be pointed out, however, that the
conditions of this theorem, as described by Lord and Cressie (1975), are
not satisfied in general. To see this, it is sufficient to observe that
the first derivative of

$$h_1(\zeta) = \sum_{x=0}^{x_0-1} \binom{n}{x} \zeta^{x} (1-\zeta)^{n-x}$$

with respect to $\zeta$ is negative. The derivative of
$h_2(\zeta) = 1 - h_1(\zeta)$ is positive, but the second derivative
can be negative. (See, however, Karlin and Shapley, 1953.)
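The sign claims about $h_2$ can be checked numerically with finite differences. The Python sketch below does this for one hypothetical test ($n = 10$, $x_0 = 8$); the helper names and step size are assumptions, and a numerical check of this kind illustrates the point without proving it.

```python
import math

def h2(zeta, n=10, x0=8):
    """Pr(x >= x0 | zeta) for a binomial score; equals 1 - h1(zeta)."""
    return sum(math.comb(n, x) * zeta**x * (1 - zeta) ** (n - x)
               for x in range(x0, n + 1))

def first_diff(f, zeta, eps=1e-4):
    """Central-difference approximation to the first derivative."""
    return (f(zeta + eps) - f(zeta - eps)) / (2 * eps)

def second_diff(f, zeta, eps=1e-4):
    """Central-difference approximation to the second derivative."""
    return (f(zeta + eps) - 2 * f(zeta) + f(zeta - eps)) / eps**2

for zeta in (0.5, 0.9):
    print(zeta, first_diff(h2, zeta) > 0, second_diff(h2, zeta) < 0)
```

For these values the first derivative is positive at both points, while the second derivative is positive at $\zeta = .5$ and negative at $\zeta = .9$, so $h_2$ is increasing but neither convex nor concave on the whole interval.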

5. Another Application

As an illustration and another application of how the upper bounds
to $\alpha$ and $\beta$ might be used, we consider the problem of
determining how many items to include on a mastery test. For technical
reasons (Fhaner, 1974; Wilcox, in press) it is necessary to formulate the
above model of mastery testing in a slightly different fashion. In
addition to the criterion score $\zeta_0$, we specify the constants
$\zeta_1$ and $\zeta_2$ where $\zeta_1 \le \zeta_0 \le \zeta_2$.
If $\zeta_1 < \zeta < \zeta_2$ we say that the examinee is classified
correctly with probability one since there is negligible loss if a
misclassification is made. However, if $\zeta \le \zeta_1$ or
$\zeta \ge \zeta_2$, we want to be reasonably certain that a correct
decision is made. More specifically, we want to choose $n$, the test
length, so that the probability of both a false-positive and
false-negative decision is reasonably small. We specify this criterion by
requiring

$$\alpha \le \alpha^* \tag{5.1}$$

and

$$\beta \le \beta^* \tag{5.2}$$

where $\alpha^*$ and $\beta^*$ are given constants. For this model of a
mastery test we now have that $\alpha = \Pr(x \ge x_0,\ \zeta \le \zeta_1)$
and $\beta = \Pr(x < x_0,\ \zeta \ge \zeta_2)$.
If $\zeta_0 = \zeta_1 = \zeta_2$ it may be impossible to choose $n$ so
that (5.1) and (5.2) are satisfied. The solution given by Fhaner (1974)
is to choose the smallest $n$ so that simultaneously

$$\sum_{x=x_0}^{n} \binom{n}{x} \zeta_1^{x} (1-\zeta_1)^{n-x} \le \alpha^* \tag{5.3}$$

and

$$\sum_{x=0}^{x_0-1} \binom{n}{x} \zeta_2^{x} (1-\zeta_2)^{n-x} \le \beta^*. \tag{5.4}$$



For the sake of illustration, suppose $\alpha^* = .12$, $\beta^* = .04$,
$\zeta_1 = .7$, $\zeta_2 = .9$, $\mu = .945$ and $\sigma^2 = .003$. To
determine an appropriate test length we first compute upper bounds to
$\alpha$ and $\beta$. Since false-positive and false-negative decisions
are now defined in terms of $\zeta_1$ and $\zeta_2$ rather than
$\zeta_0$, the expressions for $U$ and $m$ are no longer appropriate. To
determine an upper bound on $\Pr(\zeta \ge \zeta_2)$, we now use

$$m = \max[\mu(\zeta_2 - \mu),\ (\mu - \zeta_2)(1 - \mu)]$$

and we replace $U$ with

$$U = \begin{cases} \dfrac{\sigma^2}{\sigma^2 + (\mu - \zeta_2)^2}, & \text{if } 0 < \sigma^2 \le m \\[2ex] \dfrac{\mu(1-\mu) - \sigma^2}{(1-\zeta_2)\,\zeta_2}, & \text{otherwise.} \end{cases}$$

In our example, $m = .0025$, $\mu = .945$ and the resulting value of $U$
is .544. If we assume that the binomial error model holds, we may apply
the arguments of the previous section which imply that

$$\beta \le U \sum_{x=0}^{x_0-1} \binom{n}{x} \zeta_2^{x} (1-\zeta_2)^{n-x} = .544 \sum_{x=0}^{x_0-1} \binom{n}{x} (.9)^{x} (.1)^{n-x}. \tag{5.5}$$



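As for $\alpha$, we use

$$m = \max[\mu(\zeta_1 - \mu),\ (\mu - \zeta_1)(1 - \mu)]$$

and the upper bound to $\alpha$ is

$$U_1 = \begin{cases} \dfrac{\sigma^2}{\sigma^2 + (\mu - \zeta_1)^2}, & \text{if } 0 < \sigma^2 \le m \\[2ex] \dfrac{\mu(1-\mu) - \sigma^2}{(1-\zeta_1)\,\zeta_1}, & \text{otherwise.} \end{cases}$$

For the case at hand, $m = .0135$. The resulting value of $U_1$ is .0476
and so

$$\alpha \le U_1 \sum_{x=x_0}^{n} \binom{n}{x} \zeta_1^{x} (1-\zeta_1)^{n-x} = .048 \sum_{x=x_0}^{n} \binom{n}{x} (.7)^{x} (.3)^{n-x}. \tag{5.6}$$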
.^>^0 



We evaluated these upper bounds for increasing values of $n$ with $x_0$
chosen to be the smallest integer such that $x_0/n \ge \zeta_0$. For this
particular case, the smallest value of $n$ required so that both (5.1)
and (5.2) are satisfied is $n = 10$ with $x_0 = 8$. If inequalities (5.3)
and (5.4) rather than (5.5) and (5.6) are used, we require $n = 25$.
Thus, we are able to justify a substantially shorter test than $n = 25$
without making any assumption about the form of the true score
distribution.
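The search over test lengths just described can be sketched in Python as follows. The function names are hypothetical, and one input is an assumption: the criterion score $\zeta_0$ is not legible in the example, so $\zeta_0 = .8$ is assumed here because it is consistent with the reported passing score $x_0 = 8$ at $n = 10$. The bounds are the products of a moment bound and a binomial tail sum, as in (5.5) and (5.6).

```python
import math
from fractions import Fraction

def moment_bound(mu, var, zeta):
    """Two-branch moment bound (the U and U_1 of this section)."""
    m = max(mu * (zeta - mu), (mu - zeta) * (1 - mu))
    if 0 < var <= m:
        return var / (var + (mu - zeta) ** 2)
    return (mu * (1 - mu) - var) / ((1 - zeta) * zeta)

def upper_tail(n, x0, zeta):
    """Pr(x >= x0 | zeta) under the binomial error model."""
    return sum(math.comb(n, x) * zeta**x * (1 - zeta) ** (n - x)
               for x in range(x0, n + 1))

def shortest_test(alpha_star, beta_star, zeta0, zeta1, zeta2, mu, var, n_max=200):
    """Smallest n (with its x0) whose bounds (5.5) and (5.6) meet both targets."""
    u = moment_bound(mu, var, zeta2)     # multiplies the beta sum, as in (5.5)
    u1 = moment_bound(mu, var, zeta1)    # multiplies the alpha sum, as in (5.6)
    for n in range(1, n_max + 1):
        x0 = math.ceil(zeta0 * n)        # smallest integer with x0 / n >= zeta0
        alpha_bound = u1 * upper_tail(n, x0, zeta1)
        beta_bound = u * (1.0 - upper_tail(n, x0, zeta2))
        if alpha_bound <= alpha_star and beta_bound <= beta_star:
            return n, x0
    return None

# zeta0 = 4/5 is an assumed value; a Fraction keeps the ceiling exact.
result = shortest_test(0.12, 0.04, Fraction(4, 5), 0.7, 0.9, 0.945, 0.003)
print(result)  # (10, 8)
```

Under these assumptions the search returns $n = 10$ with $x_0 = 8$, in agreement with the values reported in the text.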

As a final illustration, we analyze some test data reported by
Huynh (1976). A five-item arithmetic test was administered to 91
students whose test scores $x = 0, 1, 2, 3, 4$ and $5$ have frequencies
4, 14, 9, 17, 21, 26, respectively. The mean and variance of the true
score distribution are estimated to be $\hat{\mu} = .653$ and
$\hat{\sigma}^2 = .065$. The resulting value of $U$ is .52. Also,
$U_1 = .77$. Thus, letting $\zeta_0$, $\zeta_1$, $\zeta_2$ retain the
same values as before,

$$\beta \le .51 \sum_{x=0}^{x_0-1} \binom{n}{x} (.9)^{x} (.1)^{n-x}$$

and

$$\alpha \le .76 \sum_{x=x_0}^{n} \binom{n}{x} (.7)^{x} (.3)^{n-x}.$$

Setting $\alpha^* = .1$[second digit illegible] and $\beta^* = .09$, the
minimum required test length is $n = 19$ with $x_0 = 16$.



Concluding Remarks

In summary, we have indicated methods of obtaining upper and lower
bounds to both $\alpha$ and $\beta$ which make no assumptions about the
form of the true score distribution. The first method depends only on our
ability to determine the mean and variance of the true score
distribution. As indicated above, such estimates are readily available
when the binomial or compound binomial error model is assumed. The second
method is based on the binomial error model which is frequently used to
describe a mastery test. As was illustrated, the resulting upper bounds
may be particularly useful when determining the length of the test.